Word-Sense Distinguishability and Inter-Coder Agreement
Authors
Abstract
It is common in NLP that the categories into which text is classified do not have fully objective definitions. Examples of such categories are lexical distinctions such as part-of-speech tags and word-sense distinctions, sentence-level distinctions such as phrase attachment, and discourse-level distinctions such as topic or speech-act categorization. This paper presents an approach to analyzing the agreement among human judges for the purpose of formulating a refined and more reliable set of category designations. We use these techniques to analyze the sense tags assigned by five judges to the noun "interest". The initial tag set is taken from Longman's Dictionary of Contemporary English. Through this process of analysis, we automatically identify and assign a revised set of sense tags for the data. The revised tags exhibit high reliability as measured by Cohen's κ. Such techniques are important for formulating and evaluating both human and automated classification systems.
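To make the reliability measure concrete, here is a minimal sketch of Cohen's κ, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two judges and p_e is the agreement expected by chance from each judge's tag frequencies. Because κ is defined for a pair of judges, the sketch summarizes several judges by averaging κ over all pairs; that is an assumption for illustration, not necessarily how the paper aggregates its five judges. The helper `collapse`, the sense labels `s1` to `s3`, and the tag data are all hypothetical; they only show how merging senses that judges confuse can change agreement, and are not the paper's revision procedure.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two judges who tagged the same items."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("judges must tag the same non-empty set of items")
    n = len(tags_a)
    # Observed agreement: fraction of items given the same tag by both judges.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement: product of each judge's marginal tag proportions.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both judges used one identical tag throughout
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(judgments):
    """Average Cohen's kappa over every pair of judges (hypothetical summary)."""
    pairs = list(combinations(judgments, 2))
    return sum(cohen_kappa(judgments[j], judgments[k]) for j, k in pairs) / len(pairs)


def collapse(tags, mapping):
    """Hypothetical helper: relabel tags with a coarser sense inventory."""
    return [mapping.get(t, t) for t in tags]


# Hypothetical sense tags from three judges for eight occurrences of "interest".
judgments = {
    "A": ["s1", "s1", "s2", "s3", "s1", "s2", "s3", "s1"],
    "B": ["s1", "s1", "s2", "s2", "s1", "s3", "s3", "s1"],
    "C": ["s1", "s1", "s2", "s3", "s1", "s2", "s3", "s1"],
}
print(round(mean_pairwise_kappa(judgments), 3))  # 0.733

# Merge the two senses that judge B confuses, then re-measure agreement.
merged = {j: collapse(t, {"s2": "s23", "s3": "s23"}) for j, t in judgments.items()}
print(round(mean_pairwise_kappa(merged), 3))  # 1.0
```

In this toy data, judges A and B disagree only on items tagged s2 or s3, so merging those two senses into one tag removes every disagreement and κ rises to 1.0.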
Similar resources
What Determines Inter-Coder Agreement in Manual Annotations? A Meta-Analytic Investigation
Recent discussions of annotator agreement have mostly centered around its calculation and interpretation, and the correct choice of indices. Although these discussions are important, they only consider the “back-end” of the story, namely, what to do once the data are collected. Just as important in our opinion is to know how agreement is reached in the first place and what factors influence cod...
Cro36WSD: A Lexical Sample for Croatian Word Sense Disambiguation
We introduce Cro36WSD, a freely-available medium-sized lexical sample for Croatian word sense disambiguation (WSD). Cro36WSD comprises 36 words: 12 adjectives, 12 nouns, and 12 verbs, balanced across both frequency bands and polysemy levels. We adopt the multi-label annotation scheme in the hope of lessening the drawbacks of discrete sense inventories and obtaining more realistic annotations fr...
Investigating the reliability of radiologists and their performance in diagnosing the malignancy of ovarian masses from ultrasonography
Background: Intra-rater agreement in observation and decision making in the diagnosis of any disease is of great importance. This investigation asks each radiologist to observe and read ultrasound images of ovarian cysts and to distinguish their category. Distinguishability is one of the related entities in this matter, and radiologists' ability to diagnose correctly is of great concern. In this study...
A Multi-domain Corpus of Swedish Word Sense Annotation
We describe the word sense annotation layer in Eukalyptus, a freely available five-domain corpus of contemporary Swedish with several annotation layers. The annotation uses the SALDO lexicon to define the sense inventory, and allows word sense annotation of compound segments and multiword units. We give an overview of the new annotation tool developed for this project, and finally present an an...
The MASC Word Sense Sentence Corpus
The MASC project has produced a multi-genre corpus with multiple layers of linguistic annotation, together with a sentence corpus containing WordNet 3.1 sense tags for 1000 occurrences of each of 100 words produced by multiple annotators, accompanied by in-depth inter-annotator agreement data. Here we give an overview of the contents of MASC and then focus on the word sense sentence corpus, desc...